Attention is all you need

One of the lessons from deep learning is that the more we let the neural network learn from data, the better it performs, as long as there is enough data.

The attention mechanism and the Transformer model are a good example. Instead of hand-designing inductive biases (however reasonable they are), simply adding attention tends to work better when more data is available.
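The core of the Transformer is scaled dot-product attention: each query is compared against all keys, and the resulting softmax weights mix the values. A minimal NumPy sketch (the array shapes and toy sizes here are illustrative, not from the original post):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query to each key
    # Numerically stable softmax over the key axis
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # each output is a weighted average of the values

# Toy example: 3 queries attend over 4 key/value pairs of dimension 2
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 2))
K = rng.normal(size=(4, 2))
V = rng.normal(size=(4, 2))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 2)
```

Note that nothing in this computation is specific to the task: which positions attend to which is learned entirely from data, which is exactly the point above.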

https://wonjae.kim/blog/2021/Exploiting_Contemporary_ML/